SELECTING REPRESENTATIVE POINTS IN NORMAL POPULATIONS

by

S. Iyengar and H. Solomon

The representation of a continuous random variable by several discrete points occurs often in applied probability problems. Quantization is the term applied to this procedure, and optimal quantizers have been sought by a number of investigators. This requires defining suitable measures for the error inherent in the procedure and then constructing quantizing procedures that minimize the expected error. While the problems that motivate quantization are far ranging, the mathematization leading to solutions is essentially always the same.

Most efforts are devoted to one-dimensional random variables. Variables in two, three, and higher dimensions are less tractable, but in this paper we explore some special cases in two and three dimensions. The loss function we employ is mean square error. Zador [1963,1982] explored the multivariate normal random variable and its quantization by a random choice of representative points. He does not restrict himself to mean square error; rather, he defines error as the $s$-th power of the distance between the random variable and its quantization and derives results about its asymptotic properties.

The IEEE very recently published a special issue on the topic of quantization [1982] that collected quite recent work, along with work by Zador and other investigators that was reported but not published as much as 25 years ago. The papers in the issue arise out of an electrical engineering and information theory framework and ignore the efforts of workers in other disciplines, who in turn are unaware of the work of these authors.

For the one-dimensional normal random variable, there are papers by Bofinger [1970], who studied the question of grouping a continuous bivariate normal by selecting intervals on the marginals that would provide the maximum possible correlation between the marginal variables, and by Sitgreaves [1961], who arrived at the same bivariate model as Bofinger in connection with a psychometric query on optimal test items for an achievement test. In each case, the univariate normal is quantized in an optimal manner. Maximizing the correlation is equivalent to minimizing mean square error in those cases. Earlier workers also include Cox [1957] and Anderberg [1973], each of whom seeks to sectionalize or quantize the univariate normal for subsequent data analysis. Recently Fang and He [1982], motivated by clothing size category representations, provide a detailed analysis for the univariate normal and give tables of representative points for N = 1, 2, 3, ..., 31. In an earlier paper in the electrical engineering literature, Max [1960] gives representative points for N = 1, 2, 3, ..., 26. Where tabled values of optimal representative points and interval endpoints can be compared, the investigators agree.

A rather early paper on quantization is by Steinhaus [1956]. In that paper, he demonstrates the two necessary (but not sufficient) conditions for optimal quantization, namely, that the optimal representative points are given by $q_i = E(X \mid X \in Q_i)$ when mean squared error is the loss function; and the optimal regions are nearest neighbor regions, namely

$$Q_i = \{x : |x - q_i| \le \min_{1 \le j \le N} |x - q_j|\}.$$


Let us look into the computation of the optimal quantization of a continuous random variable, $X$. That is, we divide the real line into $N$ disjoint intervals $\{Q_i\}_1^N$, pick a representative point $q_i \in Q_i$, and define $Q(X) = q_i$ whenever $X \in Q_i$. The loss of information is indicated by, say, $(X - Q(X))^2$, and we wish to minimize $E(X - Q(X))^2$ by choosing $\{Q_i\}_1^N$ and $\{q_i\}_1^N$ appropriately. We now describe and compare the methods proposed by Lloyd [1957,1982] and Zador [1963,1982], and our modification of Zador's method.

Lloyd notes that given the intervals $\{Q_i\}_1^N$, the optimal representative points are given by the centroids $q_i = E(X \mid X \in Q_i)$. This is, of course, a consequence of the fact that we use mean squared error; if, instead, we used $E|X - Q(X)|$ as our criterion, the optimal $q_i$ would be given by the conditional median. He also notes that given $\{q_i\}_1^N$, the optimal intervals are just the nearest neighbor regions, $Q_i = \{x : |x - q_i| \le \min_{1 \le j \le N} |x - q_j|\}$. We have already noted that Steinhaus lists these two conditions.

These two necessary conditions for an optimal quantization suggest an iterative procedure. In particular, we start with points $\{q_i^{(1)}\}$ and define the corresponding optimal intervals $\{Q_i^{(1)}\}$; then we let $q_i^{(2)} = E(X \mid X \in Q_i^{(1)})$, and so on. We repeat this procedure until we have convergence. One important question, then, is when does the procedure converge?

Lloyd presents a simple example to show that when the density of $X$ is bimodal, the iterative procedure may converge to a local minimum and not a global one. Kieffer [1982] has the following positive result: if the density of $X$ is log-concave and not piecewise affine on $\mathbb{R}$, then this iterative procedure converges to the unique optimal points at an exponential rate; that is, if $q^{(n)}$ denotes the vector of points after $n$ iterations, then $\|q^{(n)} - q^*\| < a^n$ for all large $n$, where $q^*$ is the optimal quantization and $0 < a < 1$. When $X \sim N(0,1)$, Lloyd gives a table of optimal points for N = 2, 4, 8, 16 and the corresponding mean square errors. Our experience has shown that the initial points should be chosen symmetrically about zero, else the procedure converges much more slowly. For a table of the optimal representation points and errors, see the papers by Lloyd, Max, and Fang and He.
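The iterative procedure can be sketched in a few lines for the case $X \sim N(0,1)$. This is a minimal illustration, not the code of any of the papers cited: the function name `lloyd_normal`, the tolerance, and the starting points are our own assumptions. It uses the fact that the centroid of an interval $(a,b)$ under the standard normal is $(\phi(a) - \phi(b)) / (\Phi(b) - \Phi(a))$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_normal(points, tol=1e-12, max_iter=10_000):
    """Alternate nearest-neighbor intervals and centroids for X ~ N(0, 1)."""
    q = sorted(points)
    for _ in range(max_iter):
        # Nearest-neighbor intervals: boundaries are midpoints of adjacent points.
        mids = [(a + b) / 2.0 for a, b in zip(q, q[1:])]
        edges = [-math.inf] + mids + [math.inf]
        # Centroid of each interval (lo, hi) under the standard normal.
        new_q = [(phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))
                 for lo, hi in zip(edges, edges[1:])]
        if max(abs(a - b) for a, b in zip(q, new_q)) < tol:
            return new_q
        q = new_q
    return q

# Symmetric starting points, as recommended in the text.
two = lloyd_normal([-1.0, 1.0])
three = lloyd_normal([-1.0, 0.0, 1.0])
```

With these symmetric starts the N=2 points settle at $\pm\sqrt{2/\pi}$ and the N=3 points stay symmetric with the middle point at zero.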

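For future reference we write out the mean squared error when $X \sim N(0,1)$. Let $\phi(x)$ and $\Phi(x)$ be the standard normal density and distribution functions, respectively, and assume that $q_1 < q_2 < \cdots < q_N$. Then

$$
\begin{aligned}
E(X - Q(X))^2 &= 1 - 2\,E\,XQ(X) + E\,Q(X)^2 \\
&= 1 + 2 q_1 \phi\!\left(\tfrac{q_1+q_2}{2}\right)
 + 2 \sum_{i=2}^{N-1} q_i \left( \phi\!\left(\tfrac{q_i+q_{i+1}}{2}\right) - \phi\!\left(\tfrac{q_i+q_{i-1}}{2}\right) \right)
 - 2 q_N \phi\!\left(\tfrac{q_{N-1}+q_N}{2}\right) \\
&\quad + q_1^2\, \Phi\!\left(\tfrac{q_1+q_2}{2}\right)
 + q_N^2\, \Phi\!\left(-\tfrac{q_N+q_{N-1}}{2}\right)
 + \sum_{i=2}^{N-1} q_i^2 \left\{ \Phi\!\left(\tfrac{q_i+q_{i+1}}{2}\right) - \Phi\!\left(\tfrac{q_i+q_{i-1}}{2}\right) \right\}.
\end{aligned}
$$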
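The expression above can be evaluated directly. The following sketch is illustrative only (the name `mse_normal` is ours); it sums, over the nearest-neighbor intervals, the contributions $-2 q_i\,E[X; X \in Q_i] + q_i^2\,P(X \in Q_i)$, which is the same formula written interval by interval.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mse_normal(points):
    """E(X - Q(X))^2 for X ~ N(0, 1) with nearest-neighbor intervals."""
    q = sorted(points)
    mids = [(a + b) / 2.0 for a, b in zip(q, q[1:])]
    edges = [-math.inf] + mids + [math.inf]
    total = 1.0  # E X^2
    for qi, lo, hi in zip(q, edges, edges[1:]):
        # E[X; lo < X < hi] = phi(lo) - phi(hi); P(lo < X < hi) = Phi(hi) - Phi(lo).
        total += -2.0 * qi * (phi(lo) - phi(hi)) + qi * qi * (Phi(hi) - Phi(lo))
    return total

a = math.sqrt(2.0 / math.pi)
mse_two = mse_normal([-a, a])                 # equals 1 - 2/pi
mse_three = mse_normal([-1.224, 0.0, 1.224])  # near the tabled optimum for N = 3
```

For the optimal two-point quantizer this gives $1 - 2/\pi \approx .3634$, in agreement with the tabled value.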


Zador proposes a random quantizer: the $q_i$ are chosen randomly according to some density $g$. The mean squared error is then a random variable and the problem now is to choose $g$ so that some aspect of the distribution of the mean squared error is optimized. Zador shows that $N^2$ times the mean squared error has a limiting distribution and he computes the mean of this limit: $N^2(\mathrm{MSE}_g) \to Z_g$ and $E Z_g = \mu_g$. He then shows that by choosing $g(x) = \phi^{1/3}(x) / \int \phi^{1/3}(t)\,dt$, $\mu_g$ is minimized. It is clear that the optimality criterion chosen by Zador is quite distinct from Lloyd's criterion. Random quantization in one dimension is not a crucial issue. This is because optimal quantization typically involves only one-dimensional integrals whose computation is efficient. Optimal quantization in higher dimensions, though, rapidly becomes expensive, and it is here that random quantization could be valuable. However, since we know many results for the one-dimensional case, it is of interest to see how random quantization compares in that case. In the one-dimensional case, we now propose several improvements over Zador's scheme.

First, choosing the asymptotically optimal $g(x) = (1/\sqrt{3})\,\phi(x/\sqrt{3})$ may not do well for finite $N$. In fact, we shall show that in one case, choosing $g_\sigma(x) = \frac{1}{\sigma}\phi\!\left(\frac{x}{\sigma}\right)$ with $\sigma = .545$ yields substantial improvement. Second, it seems intuitively clear that the optimal points $\{q_i\}$ ought to be located symmetrically about zero. Zador's scheme does not guarantee this symmetry, but it is fairly easy to modify it to do so.

To illustrate these modifications, we do the analytically tractable cases, N=2 and N=3.

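When N=2, assuming $q_1 < q_2$, we have that

$$
\begin{aligned}
E(X - Q(X))^2 &= 1 + q_1^2\, \Phi\!\left(\tfrac{q_1+q_2}{2}\right) + q_2^2\, \Phi\!\left(-\tfrac{q_1+q_2}{2}\right) - 2 (q_2 - q_1)\, \phi\!\left(\tfrac{q_1+q_2}{2}\right) \\
&= 1 - 2 (q_2 - q_1)\, \phi\!\left(\tfrac{q_1+q_2}{2}\right) + q_2^2 - (q_2 - q_1)(q_2 + q_1)\, \Phi\!\left(\tfrac{q_1+q_2}{2}\right).
\end{aligned}
$$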
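Under Zador's scheme, we generate $Y_1, Y_2$ i.i.d. $N(0,\sigma^2)$ and set $q_1 = Y_{(1)}$, $q_2 = Y_{(2)}$. Then the range $(q_2 - q_1)$ is clearly independent of the mean $(q_1 + q_2)/2$. Now let $X, X_1, X_2$ be i.i.d. $N(0,1)$. Then

(i) $q_2 = \sigma X_{(2)} \Rightarrow E q_2^2 = \sigma^2 E X_{(2)}^2 = 2\sigma^2 \int t^2 \Phi(t)\phi(t)\,dt = \sigma^2$.

(ii) $E(q_2 - q_1) = \sigma E|X_2 - X_1| = 2\sigma/\sqrt{\pi}$.

(iii) $E\,\phi\!\left(\frac{q_1+q_2}{2}\right) = E\,\phi\!\left(\frac{\sigma}{\sqrt{2}} X\right) = 1/\sqrt{\pi(2+\sigma^2)}$.

(iv) $E\,(q_1+q_2)\,\Phi\!\left(\frac{q_1+q_2}{2}\right) = 2E\,\frac{\sigma}{\sqrt{2}} X\, \Phi\!\left(\frac{\sigma}{\sqrt{2}} X\right) = \sigma^2/\sqrt{\pi(2+\sigma^2)}$.

Collecting terms, we see that under Zador's scheme,

$$\mathrm{MSE}(\sigma) = E(X - Q(X))^2 = 1 + \sigma^2 - \frac{2\sigma}{\pi}\sqrt{2+\sigma^2}.$$

The asymptotically optimal $\sigma^2$ is 3, for which the mean square error is 1.53. Straightforward numerical work shows, however, that $\mathrm{MSE}(\sigma)$ is minimized for $\sigma_{\min} = .545$ and $\mathrm{MSE}(.545) = .77$, which is almost half of $\mathrm{MSE}(\sqrt{3})$.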

It is also of interest to know if we get much improvement if we force the random points to be symmetric about the origin. In this case we generate $Y \sim N(0,\sigma^2)$ and let $q_1 = -|Y|$, $q_2 = |Y|$. For this situation, the mean squared error is $E(1 + Y^2 - 4|Y|/\sqrt{2\pi}) = 1 + \sigma^2 - \frac{4}{\pi}\sigma$, which is minimized for $\sigma = \frac{2}{\pi}$, and the minimum value is $1 - \frac{4}{\pi^2} = .59$. A similar computation for N=3 (in which case $q_1 = -|Y|$, $q_2 = 0$, and $q_3 = |Y|$) shows that the mean square error assumes a minimum value of .41 for $\sigma = 1.32$.
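The "straightforward numerical work" for N=2 is easy to reproduce. The sketch below is our own illustration: it takes the two closed forms $1 + \sigma^2 - (2\sigma/\pi)\sqrt{2+\sigma^2}$ and $1 + \sigma^2 - 4\sigma/\pi$ and minimizes each with a golden-section search. The helper name `golden_min` and the search interval $[0, 3]$ are assumptions made for the example.

```python
import math

def mse_zador(sigma):
    """N = 2; the points are the order statistics of two i.i.d. N(0, sigma^2) draws."""
    return 1.0 + sigma ** 2 - (2.0 * sigma / math.pi) * math.sqrt(2.0 + sigma ** 2)

def mse_symmetrized(sigma):
    """N = 2; the points are -|Y| and |Y| with Y ~ N(0, sigma^2)."""
    return 1.0 + sigma ** 2 - 4.0 * sigma / math.pi

def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - r * (b - a)
        d = a + r * (b - a)
        if f(c) < f(d):
            b = d   # the minimizer lies in [a, d]
        else:
            a = c   # the minimizer lies in [c, b]
    return (a + b) / 2.0

sigma_zador = golden_min(mse_zador, 0.0, 3.0)
sigma_sym = golden_min(mse_symmetrized, 0.0, 3.0)
mse_at_sqrt3 = mse_zador(math.sqrt(3.0))
```

The search recovers $\sigma \approx .545$ with error about .77 for Zador's scheme, $\sigma = 2/\pi$ with error about .59 for the symmetrized scheme, and $\mathrm{MSE}(\sqrt{3}) \approx 1.53$.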


We thus have the following short table:

                        Mean Squared Error

N    Optimal    Zador, $\sigma^2 = 3$    Zador, $\sigma_{\min}$    Symmetrized
2    .3634      1.53                     0.77                      0.59
3    .1902      .86*                                               0.41

*: Simulation

The results for N ≥ 4 are analytically intractable. Also, the .86 entry above is a simulation result while all others are exact. The optimal results for N=2 and N=3 are known; see Gray and Karnin [1982].

Before we turn to higher dimensional problems, we describe some difficulties in the one-dimensional normal case. One might naively expect that the symmetry of the distribution requires that the points be located symmetrically about zero. The conclusion is indeed true for the normal, but the reason is not symmetry alone. To illustrate the difficulties here, we consider the cases N=2 and N=3 for an arbitrary symmetric density $f$. When N=2, we have, say


$$
Q_{ab}(x) =
\begin{cases}
a & \text{if } x \le \frac{a+b}{2}, \\
b & \text{if } x > \frac{a+b}{2},
\end{cases}
$$

and we seek $a$ and $b$ to minimize $E(X - Q_{ab}(X))^2$. Differentiating this expression with respect to $a$ and $b$ and setting the partial derivatives equal to zero, we get


$$\int_{-\infty}^{\frac{a+b}{2}} (x-a) f(x)\,dx = 0, \qquad \int_{\frac{a+b}{2}}^{\infty} (x-b) f(x)\,dx = 0.$$


If we let $h(a,b) = \int_{-\infty}^{(a+b)/2} (x-a) f(x)\,dx$, then the two equations can be rewritten as $h(a,b) = h(-b,-a) = 0$. We can now say that if $(a^*, b^*)$ provides an optimal quantization, then so does $(-b^*, -a^*)$: this seems to be the only consequence of symmetry. In order to say that $(a^*, b^*)$ lie symmetrically about zero, we must have that $h(a,b) = 0$ has a unique solution. One simple sufficient condition for this is that $\log f$ be strictly concave (see Fleischer [1964], Trushkin [1982]). We conjecture that a weaker sufficient condition is that $f$ be unimodal and strictly decreasing from the mode. Notice that one solution is always $a^* = E(X \mid X < 0)$ and $b^* = E(X \mid X > 0)$.
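The step from the second stationarity equation to $h(-b,-a) = 0$ uses only the symmetry $f(-u) = f(u)$. A short worked version, with the substitution $x = -u$ made explicit, is:

```latex
\begin{align*}
h(-b,-a) &= \int_{-\infty}^{-\frac{a+b}{2}} (x + b)\, f(x)\, dx \\
         &= \int_{\frac{a+b}{2}}^{\infty} (b - u)\, f(-u)\, du
            && (x = -u) \\
         &= -\int_{\frac{a+b}{2}}^{\infty} (u - b)\, f(u)\, du
            && (f \text{ symmetric}),
\end{align*}
```

so $h(-b,-a) = 0$ exactly when the second equation holds. The symmetric solution can be checked the same way: with $b^* = -a^*$ the boundary is at zero, and $h(a^*, -a^*) = E[X; X<0] - a^*/2$, which vanishes when $a^* = E(X \mid X<0)$.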

If we require three points, one might invoke a symmetry argument to say that one of the points be at the origin. However, the following intuitively clear example shows that this is not the case. Consider the following class of symmetric bimodal densities:


[Figure: sketch of a symmetric bimodal density, with parameters L and ε.]

If ε is very small, then there is virtually no mass between -L and L. Thus, if we put one of the points at zero, we are wasting it. If we put two points in the right mode and one in the left, we capture much more information in the random variable. Of course, in this case, we can reflect the asymmetrical quantization without changing the mean square error. We omit the details of such a counterexample. The non-optimality of the symmetric quantizer is a feature of an odd number of points.
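The effect can be seen numerically in a simple stand-in for this class. The sketch below assumes a hypothetical density, an equal mixture of uniform densities on $[-L-w, -L+w]$ and $[L-w, L+w]$; the half-width `w`, the function name `mse_two_mode`, and the two quantizers compared are our illustrative choices and are not the construction omitted above.

```python
def mse_two_mode(points, L=10.0, w=0.5, n=20_000):
    """Midpoint-rule mean square error of a nearest-neighbor quantizer.

    Assumed density: equal mixture of Uniform(-L-w, -L+w) and Uniform(L-w, L+w).
    """
    step = 2.0 * w / n
    density = 1.0 / (4.0 * w)  # weight 1/2 per mode, spread over width 2w
    total = 0.0
    for center in (-L, L):
        for k in range(n):
            x = center - w + (k + 0.5) * step
            total += min((x - q) ** 2 for q in points) * density * step
    return total

L, w = 10.0, 0.5
# Symmetric choice: one point per mode plus a wasted point at the origin.
mse_symmetric = mse_two_mode([-L, 0.0, L], L, w)
# Asymmetric choice: one point in the left mode, two in the right mode.
mse_asymmetric = mse_two_mode([-L, L - w / 2.0, L + w / 2.0], L, w)
```

Under these assumptions the symmetric choice has error $w^2/3$, while the asymmetric one has error $5w^2/24$, and its mirror image does equally well.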

The quantization of random vectors in $\mathbb{R}^d$ is in many ways a much more difficult problem than that of ordinary random variables. Even the "simplest" case of $X \sim N(0,I)$ presents many difficulties, as we shall see. First of all, whenever the random variable has a spherically symmetric density, any quantization can be rotated without changing the mean square error. More precisely, we have the following lemma, which is a generalization of the lemma of Gray and Karnin [1982].

Lemma. Suppose $X$ has density $f(x) = g(x'x)$ and $Q(X)$ is any quantizer. Then the quantizers in the family $\{Q_\Gamma(X)\}_\Gamma$, where $Q_\Gamma(X) = \Gamma' Q(\Gamma X)$ and $\Gamma'\Gamma = I$, all have the same mean square error.

Proof. Clearly $X \stackrel{d}{=} \Gamma X$. Thus,

$$E|X - \Gamma' Q(\Gamma X)|^2 = E|\Gamma'(\Gamma X - Q(\Gamma X))|^2 = E|\Gamma X - Q(\Gamma X)|^2 = E|X - Q(X)|^2.$$

Thus, we should not consider the quantizations $Q(X)$ and $\Gamma' Q(\Gamma X)$ as distinct.

As in the one-dimensional case, any quantization has two components, the subsets $\{Q_i\}_1^N$ of $\mathbb{R}^d$ and the representative points $\{q_i\}_1^N$ of each subset. Because we use mean square error, we again have the following necessary conditions for the optimality of a quantizer:

(i) the representative point must be the centroid of the respective subset: $q_i = E(X \mid X \in Q_i)$.

(ii) $Q_i$ is determined by the nearest neighbor rule: $Q_i = \{x : \|x - q_i\| \le \min_j \|x - q_j\|\}$.

It is clear that if $Q(X)$ satisfies (i) and (ii), then so does $\Gamma' Q(\Gamma X)$. Thus, $\{\Gamma' Q(\Gamma X)\}_\Gamma$ are all fixed points of Lloyd's algorithm. This is not the real source of the difficulty, however. In general, there will be several distinct local minima. For example, in the normal case, if d = 2 and N = 4, Gray and Karnin give the following configurations, which are fixed points of Lloyd's algorithm.


[Figure: configurations I, II, and III.]


They conjecture that these three are the only fixed points of the algorithm. Standardizing the error, that is, considering $\frac{1}{2} E\|X - Q(X)\|^2$, Gray and Karnin show that the average errors are .3634, .5588, and .4102 for I, II, and III respectively. We call configuration I the product quantizer since it is the Cartesian product of the optimal one-dimensional quantizer with itself. Quantizer II performs rather poorly. Quantizer III has the intuitive appeal that one point is located at the origin, which is the mode of the distribution. Gray and Karnin comment that it "was a surprise to us that the distortion resulting from [code III] was so much larger than that of [code I]."

However, the intuition that says that one representative point should be at the origin because that is the location of the mode of the distribution can be very misleading. Consider, for instance, the problem of quantizing $X \sim N_2(0,I)$ with three points. Two configurations immediately come to mind:

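First configuration: the collinear points $(-\mu, 0)$, $(0, 0)$, $(\mu, 0)$.

Second configuration: the vertices of an equilateral triangle centered at the origin, $b(1, 0)$, $b(-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2})$, $b(-\tfrac{1}{2}, -\tfrac{\sqrt{3}}{2})$.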


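In the first case, the mean square error as a function of $\mu$ is

$$
\begin{aligned}
\mathrm{MSE}_1(\mu) &= E\|X - Q(X)\|^2 = E\|X\|^2 + E\|Q(X)\|^2 - 2E\,X'Q(X) \\
&= 2 + 2\mu^2 \Phi\!\left(-\tfrac{\mu}{2}\right)
 - 2\int_{-\infty}^{\infty}\!\int_{\mu/2}^{\infty} x\mu\,\phi(x)\phi(y)\,dx\,dy
 + 2\int_{-\infty}^{\infty}\!\int_{-\infty}^{-\mu/2} x\mu\,\phi(x)\phi(y)\,dx\,dy \\
&= 2 + 2\mu^2 \Phi\!\left(-\tfrac{\mu}{2}\right) - 4\mu\,\phi\!\left(\tfrac{\mu}{2}\right).
\end{aligned}
$$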


To find the $\mu$ that minimizes this, we set $\mathrm{MSE}_1'(\mu) = 0$ to get $R(\frac{\mu}{2}) = \frac{1}{\mu}$, where $R(x) = \Phi(-x)/\phi(x)$ is Mills' ratio (for a more thorough discussion of Mills' ratio, see Iyengar [1982]). Straightforward numerical work shows that the minimizing $\mu$ is 1.224 and that the average noise is $\frac{1}{2}\mathrm{MSE}_1(1.224) = .5951$.
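For completeness, the stationarity condition can be written out as a worked step, using $\phi'(t) = -t\,\phi(t)$ and writing the first configuration's error as $2 + 2\mu^2\Phi(-\mu/2) - 4\mu\,\phi(\mu/2)$:

```latex
\begin{align*}
\frac{d}{d\mu}\Bigl[2\mu^2\,\Phi\bigl(-\tfrac{\mu}{2}\bigr)\Bigr]
  &= 4\mu\,\Phi\bigl(-\tfrac{\mu}{2}\bigr) - \mu^2\,\phi\bigl(\tfrac{\mu}{2}\bigr), \\
\frac{d}{d\mu}\Bigl[-4\mu\,\phi\bigl(\tfrac{\mu}{2}\bigr)\Bigr]
  &= -4\,\phi\bigl(\tfrac{\mu}{2}\bigr) + \mu^2\,\phi\bigl(\tfrac{\mu}{2}\bigr), \\
\mathrm{MSE}_1'(\mu)
  &= 4\Bigl[\mu\,\Phi\bigl(-\tfrac{\mu}{2}\bigr) - \phi\bigl(\tfrac{\mu}{2}\bigr)\Bigr].
\end{align*}
```

Setting this to zero gives $\Phi(-\mu/2)/\phi(\mu/2) = 1/\mu$, which is the Mills' ratio condition.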

The noise for the second configuration is

$$
\begin{aligned}
\mathrm{MSE}(b) &= 2 + b^2 - 2E\,X'Q(X) \\
&= 2 + b^2 - 2(3)\int_0^{\infty}\!\int_{-\sqrt{3}x}^{\sqrt{3}x} x b\,\phi(y)\phi(x)\,dy\,dx \\
&= 2 + b^2 - 3\sqrt{3}\,\phi(0)\,b.
\end{aligned}
$$

In this case our optimizing $b$ is easily seen to be $\frac{3\sqrt{3}}{2}\phi(0)$, so that the average mean squared error is $1 - 27/16\pi = .4629$. It is now clear that the second configuration is considerably better than the first one, even though the first one has a point at the origin. (Note that the two configurations do satisfy the two necessary conditions for optimality; we omit a formal proof.)
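Both optimizations can be checked numerically. The sketch below is ours and is only illustrative: it solves $\mu\,\Phi(-\mu/2) = \phi(\mu/2)$ (the stationarity condition for the collinear configuration) by bisection on an assumed bracket $[0.1, 5]$, and uses the closed-form minimizer $b = \frac{3\sqrt{3}}{2}\phi(0)$ for the triangle. The function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mse_line(mu):
    """Error of the collinear points (-mu, 0), (0, 0), (mu, 0) for X ~ N_2(0, I)."""
    return 2.0 + 2.0 * mu ** 2 * Phi(-mu / 2.0) - 4.0 * mu * phi(mu / 2.0)

def mse_triangle(b):
    """Error of the equilateral triangle with circumradius b centered at the origin."""
    return 2.0 + b ** 2 - 3.0 * math.sqrt(3.0) * phi(0.0) * b

def solve_mu(lo=0.1, hi=5.0, tol=1e-12):
    """Bisection on mu * Phi(-mu/2) - phi(mu/2), negative at lo and positive at hi."""
    def g(m):
        return m * Phi(-m / 2.0) - phi(m / 2.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

mu_opt = solve_mu()
b_opt = 1.5 * math.sqrt(3.0) * phi(0.0)
avg_line = mse_line(mu_opt) / 2.0          # per-dimension error, collinear points
avg_triangle = mse_triangle(b_opt) / 2.0   # per-dimension error, triangle
```

The per-dimension errors come out near .5951 and .4629, so the triangle, which has no point at the origin, is the better of the two.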

Quantization of a standard normal in three dimensions provides some new and interesting twists. If we use eight points, one obvious choice is the product quantizer; the interesting result here is that the product quantizer can be improved upon. Indeed, Gray and Karnin give three different configurations that beat the product code. In one, there is a point at the origin, and two points lie symmetrically on a line orthogonal to a plane formed by a pentagon whose vertices are the other five points. Another quantizer, suggested by N.J.A. Sloane to Gray and Karnin [1982], was obtained by rotating the top square of the product quantizer by 45° to obtain a twisted cube. Gray and Karnin report simulation results to show that these quantizers are superior. We did a straightforward (but very expensive) numerical integration to get the results listed below. Notice that the best of these three does not have a representative point at the origin.


                          Mean Square Error

Quantizer                 Simulation          Numerical
                          (Gray, Karnin)      Integration
Product                   .3635               True
Pentagon-origin-poles     .3590               .3585
Twisted cube              .3573               .3581